Value Function Approximation and Policy Performance
Abstract
Fig. 1 gives a geometric interpretation of value function approximation. We may think of J* as a vector in ℜ^|S|; by considering approximations of the form J̃ = Φr, we restrict attention to the hyperplane J̃ = Φr in the same space. Given a norm ‖·‖ (e.g., the Euclidean norm), an ideal value function approximation algorithm would choose r minimizing ‖J* − Φr‖; in other words, it would find the projection Φr* of J* onto the hyperplane. Note that ‖J* − Φr*‖ is a natural measure of the quality of the approximation architecture, since it is the best approximation error that can be attained by any algorithm given the choice of Φ. Algorithms for value function approximation found in the literature do not compute the projection Φr*, since this is an intractable problem. Building on the knowledge that J* satisfies Bellman's equation, value function approximation typically involves adapting exact dynamic programming algorithms. For instance, drawing inspiration from value iteration, one might consider the following approximate value iteration algorithm: Φr_{k+1} = Π T Φr_k, where Π is a projection operator that maps T Φr_k back onto the hyperplane. Faced with the impossibility of computing the "best approximation" Φr*, a relevant question for any value function approximation algorithm A generating an approximation Φr_A is how large ‖J* − Φr_A‖ is in comparison with ‖J* − Φr*‖. In particular, it would be desirable that, if the approximation architecture...
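To make the iteration Φr_{k+1} = Π T Φr_k concrete, here is a minimal sketch in Python. The small randomly generated MDP, the feature matrix Phi, and the restriction to a single fixed policy (so the Bellman operator T is linear) are all illustrative assumptions, not taken from the text; Π is implemented as Euclidean (least-squares) projection onto the span of Φ.

```python
import numpy as np

# Illustrative setup (assumed, not from the text): a 10-state MDP under a
# fixed policy, with transition matrix P, cost vector g, discount alpha,
# and a 3-column feature matrix Phi.
rng = np.random.default_rng(0)
n_states, n_features = 10, 3
alpha = 0.9

P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)      # make rows stochastic
g = rng.random(n_states)
Phi = rng.random((n_states, n_features))

def T(J):
    """Bellman operator for the fixed policy: (TJ)(x) = g(x) + alpha * E[J(x')]."""
    return g + alpha * P @ J

def Pi(J):
    """Euclidean projection onto the span of Phi, via a least-squares fit."""
    r, *_ = np.linalg.lstsq(Phi, J, rcond=None)
    return Phi @ r

# Approximate value iteration: Phi r_{k+1} = Pi T Phi r_k.
# Note the composed operator Pi T is not a contraction in general,
# so convergence is not guaranteed; it does converge on this example.
J_tilde = np.zeros(n_states)
for _ in range(200):
    J_tilde = Pi(T(J_tilde))

# Exact fixed point of T for comparison: J = (I - alpha P)^{-1} g,
# and the best approximation Phi r* = Pi(J).
J_exact = np.linalg.solve(np.eye(n_states) - alpha * P, g)
best = Pi(J_exact)
print("||J - Phi r_A||:", np.linalg.norm(J_exact - J_tilde))
print("||J - Phi r*|| :", np.linalg.norm(J_exact - best))
```

Comparing the two printed norms mirrors the question posed in the abstract: how much worse the error of the computed approximation Φr_A is than the best attainable error ‖J − Φr*‖ for this choice of Φ.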
Similar resources
Debt Collection Industry: Machine Learning Approach
Businesses are increasingly interested in how big data, artificial intelligence, machine learning, and predictive analytics can be used to increase revenue, lower costs, and improve their business processes. In this paper, we describe how we have developed a data-driven machine learning method to optimize the collection process for a debt collection agency. More precisely, we create a frame...
Rates of Convergence of Performance Gradient Estimates Using Function Approximation and Bias in Reinforcement Learning
We address two open theoretical questions in Policy Gradient Reinforcement Learning. The first concerns the efficacy of using function approximation to represent the state-action value function, Q. Theory is presented showing that linear function approximation representations of Q can degrade the rate of convergence of performance gradient estimates by a factor of O(ML) relative to when no func...
Localizing Policy Gradient Estimates to Action Transitions
Function Approximation (FA) representations of the state-action value function Q have been proposed in order to reduce variance in performance gradient estimates, and thereby improve performance of Policy Gradient (PG) reinforcement learning in large continuous domains (e.g., the PIFA algorithm of Sutton et al. (in press)). We show empirically that although PIFA converges significantly faster ...
Verification of an Evolutionary-based Wavelet Neural Network Model for Nonlinear Function Approximation
Nonlinear function approximation is one of the most important tasks in system analysis and identification. Several models have been presented to achieve an accurate approximation of nonlinear mathematical functions. However, the majority of these models are specific to certain problems and systems. In this paper, an evolutionary-based wavelet neural network model is proposed for structure definiti...
Minimizing a General Penalty Function on a Single Machine via Developing Approximation Algorithms and FPTASs
This paper addresses the Tardy/Lost penalty minimization problem on a single machine. Under this penalty criterion, if the tardiness of a job exceeds a predefined value, the job is lost and penalized by a fixed value. Besides its application to real-world problems, the Tardy/Lost measure is a general form of popular objective functions such as weighted tardiness, late work, and tardiness with reje...